# Low-latency processing

Erax WoW Turbo V1.1

A Whisper Large-v3 Turbo speech recognition model optimized for Vietnamese that also supports multiple other languages, with fast response and high accuracy.

Speech Recognition

Transformers Other

Erax WoW Turbo V1.0

A Whisper Large-v3 Turbo speech recognition model optimized for Vietnamese that supports real-time transcription in multiple languages.

Speech Recognition

Transformers Other

VITA-1.5 is a multimodal interaction model designed to achieve GPT-4o-level real-time vision and voice interaction.

Speaker Diarization V1

A speaker segmentation model trained with a powerset multi-class cross-entropy loss. It processes 10-second mono audio and outputs speaker segmentation results.

Speaker Analysis
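
The powerset formulation mentioned above treats each combination of simultaneously active speakers as one class, so every audio frame gets exactly one label and can be trained with ordinary cross-entropy. The sketch below illustrates only that label encoding. The sizes (3 speakers, at most 2 overlapping) and all function names are assumptions chosen for illustration, not details taken from this model.

```python
from itertools import combinations


def powerset_classes(num_speakers, max_overlap):
    """List every speaker subset with at most `max_overlap` members."""
    classes = []
    for size in range(max_overlap + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def encode_frame(active, classes):
    """Map a set of active speaker indices to its single class index."""
    return classes.index(tuple(sorted(active)))


def decode_frame(index, classes, num_speakers):
    """Map a class index back to a per-speaker 0/1 activity vector."""
    members = classes[index]
    return [1 if s in members else 0 for s in range(num_speakers)]


# Assumed sizes for illustration only: 3 speakers, at most 2 overlapping.
CLASSES = powerset_classes(num_speakers=3, max_overlap=2)

if __name__ == "__main__":
    label = encode_frame({0, 2}, CLASSES)   # speakers 0 and 2 talk at once
    print(label, decode_frame(label, CLASSES, 3))
```

With these assumed sizes the class list holds the empty set (non-speech), three single-speaker classes, and three two-speaker classes, so a frame's per-speaker activity can be recovered from one predicted class index.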

Chester Bennington RVC 1000 Epochs

A model based on RVC (Retrieval-based Voice Conversion) technology, designed to convert input speech into Chester Bennington's vocal style.

Speech Synthesis

Wsj0 2mix Skim Small Causal

A speech enhancement model trained with the ESPnet framework for the speech separation task on the wsj0_2mix dataset.

Audio Enhancement English

Ai Light Dance Stepmania Ft Wav2vec2 Large Xlsr 53 V5

An automatic speech recognition model based on wav2vec2-large-xlsr-53, fine-tuned on the GARY109/AI_LIGHT_DANCE dataset.

Speech Recognition

AIbase

Empowering the Future, Your AI Solution Knowledge Base

© 2025 AIbase